Semi-Supervised Keyword Spotting in Arabic Speech Using Self-Training Ensembles

Author

  • Mohamed Mahmoud
Abstract

Arabic speech recognition suffers from a scarcity of properly labeled data. In this project, we introduce a pipeline that first performs semi-supervised segmentation of audio. After a small dataset has been hand-labeled, the pipeline feeds the labeled segments to a supervised learning framework. Through many rounds of hyperparameter optimization, this framework selects an ensemble of models that infers labels for a larger dataset. Using these inferred labels, we improved the keyword spotter's F1 score from 75.85% (with a baseline model) to 90.91% on a ground-truth test set. We chose the keyword na‘am (yes) as the spotting target. The system's input is an audio file of an utterance, and its output is a binary label: keyword or filler.

Keywords: Arabic; Acoustic Model; Ensemble Learning; Extra Trees; Gradient Boosting; K-Nearest Neighbors; Keyword Spotting; King Saud University Arabic Speech Database; Random Forests; Segmentation; Self-Training; Semi-Supervised Learning; Speech Recognition; Support Vector Machines; West Point Arabic Speech
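The self-training loop summarized in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the author's implementation. The pure-Python `knn_classifier` and `nearest_centroid` learners are swapped-in stand-ins for the Extra Trees, Gradient Boosting, K-Nearest Neighbors, Random Forests, and Support Vector Machine models named in the keywords. The sketch assumes each segment has already been turned into a fixed-length feature vector. The unanimous-vote confidence `threshold`, the `max_rounds` limit, and the toy data are all assumptions, and hyperparameter optimization is omitted. The `f1_score` helper computes the reported metric, F1 = 2PR / (P + R), for the keyword class.

```python
import math
from collections import Counter

KEYWORD, FILLER = 1, 0  # binary output labels: keyword (na'am) vs. filler


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classifier(k):
    """Return a fit function for a k-nearest-neighbors learner (stand-in)."""
    def fit(X, y):
        X, y = list(X), list(y)  # snapshot so later appends don't leak in

        def predict(x):
            nearest = sorted(range(len(X)), key=lambda i: euclidean(X[i], x))[:k]
            return Counter(y[i] for i in nearest).most_common(1)[0][0]
        return predict
    return fit


def nearest_centroid(X, y):
    """Fit a nearest-centroid learner (hypothetical stand-in for a tree/SVM model)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lbl in zip(X, y) if lbl == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda lbl: euclidean(centroids[lbl], x))


# Assumed ensemble; the paper selects its members via hyperparameter search.
BASE_LEARNERS = [knn_classifier(1), knn_classifier(3), nearest_centroid]


def ensemble_predict(models, x):
    """Majority vote; confidence is the fraction of models agreeing with it."""
    votes = Counter(m(x) for m in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)


def self_train(X_lab, y_lab, X_unlab, threshold=1.0, max_rounds=5):
    """Pseudo-label confident unlabeled segments and retrain, round by round."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        models = [fit(X_lab, y_lab) for fit in BASE_LEARNERS]
        confident, remaining = [], []
        for x in pool:
            label, conf = ensemble_predict(models, x)
            (confident if conf >= threshold else remaining).append((x, label))
        if not confident:
            break  # nothing new the ensemble trusts; stop early
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in remaining]
    final_models = [fit(X_lab, y_lab) for fit in BASE_LEARNERS]
    return final_models, len(X_lab)


def f1_score(y_true, y_pred, positive=KEYWORD):
    """F1 = 2PR / (P + R) for the keyword class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy 2-D "feature vectors" (hypothetical; real inputs would be acoustic features).
X_lab = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9]]
y_lab = [FILLER, FILLER, KEYWORD, KEYWORD]
X_unlab = [[0.1, 0.3], [4.8, 5.1], [0.4, 0.2], [5.3, 5.2]]

models, n_labeled = self_train(X_lab, y_lab, X_unlab)
preds = [ensemble_predict(models, x)[0] for x in [[0.3, 0.0], [4.9, 4.7]]]
print(n_labeled, preds, f1_score([FILLER, KEYWORD], preds))  # prints 8 [0, 1] 1.0
```

In this sketch a pseudo-label is accepted only when every model agrees. That trades coverage for precision: fewer unlabeled segments are absorbed per round, but errors from a weak initial ensemble are less likely to compound across rounds.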

Publication date: 2016